Fuzzy Order Statistics and Their Application
to Fuzzy Clustering*


Paul  R.  Kersten,  Member  IEEE 

Abstract - The median and the median absolute deviation (MAD) are robust statistics based
on order statistics. Order statistics are extended to fuzzy sets to define a fuzzy median and
a fuzzy MAD. The fuzzy c-Means (FCM) clustering algorithm is defined for any p-norm
(pFCM), including the $\ell_1$ norm (1FCM). The 1FCM clustering algorithm is implemented
via the alternating optimization (AO) method and the clustering centers are shown to be
the fuzzy median. The resulting AO-1FCM clustering algorithm is called the fuzzy
c-Medians (FCMED) clustering algorithm. An example illustrates the robustness of the
FCMED.


I.  INTRODUCTION 

Robust  statistics  are  designed  to  be  resistant  to  outliers.  Two  examples  are  the  median  for 
estimating  the  center  of  the  data  and  the  median  of  the  absolute  deviations  from  the  median 
(MAD)  for  estimating  the  dispersion  of  the  data.  These  statistics  do  not  apply  directly  to  fuzzy 
sets  since  both  are  based  on  order  statistics,  which  implicitly  assume  the  data  belong  entirely  in 
one  set.  These  statistics  are  extended  to  apply  to  fuzzy  sets  and  then  used  to  implement  an  AO 
version of the 1FCM clustering algorithm, where the membership functions (MFs) are given by
[1] and the cluster centers are fuzzy medians. This version is called the fuzzy c-medians
(FCMED) clustering algorithm since the weighted median plays the same role as the weighted
mean in the FCM. The FCMED algorithm improves clustering on outlier-laden data sets, where
the clusters are generated by heavy-tailed distributions.

Warfare Center. Approved for public release; distribution is unlimited.

Fuzzy medians are a special case of weighted medians, where the weights associated with the
data points may be interpreted as memberships. According to Bloomfield and Steiger [2],
weighted medians were first named by Edgeworth [3] circa 1887. The 1FCM clustering
algorithm requires the minimization of a functional that consists of the weighted sum of absolute
differences with respect to the clustering center. Jajuga [4] seems to be the first to have
formulated the 1FCM minimization as a regression problem, which then allowed him to apply
the solution found in [2] attributed to Laplace circa 1789. The optimal cluster center is the
weighted median, although Jajuga [4-5] does not seem to mention that his solution is the
weighted median. The fuzzy median set forth in this paper was first derived by the author in
[6-7] and used to independently derive the 1FCM centering statistic [8]. The weighted median
appears in numerous applications. For example, it is used in risk management [9] and image
processing [10]. In regression, the weighted median provides a robust slope estimate [11].
Another example is in the remedian approximation to the median [12]. Fuzzy clustering using
the $\ell_1$ norm is not new and has been researched by others [1,13]. In [13] the authors use a
reformulated version of the FCM and apply a general search method to find the cluster center
and memberships. In [1] the AO-1FCM is used, where the memberships are solved for explicitly
as in the FCM and the cluster centers are determined by a linear programming algorithm. The
k-medoid method is a collection of algorithms that may use the $\ell_1$ metric and could include a
k-median hard clustering algorithm [14]. Unfortunately, the k-median is also another name for the
k-medoid method, leading to some confusion [14, p. 72].

This  paper  is  organized  as  follows:  Section  II  contains  a  definition  of  fuzzy  order  statistics  as 
well  as  the  extension  of  the  median  and  the  MAD  to  fuzzy  sets.  In  Section  III,  the  quantiles  are 
extended  to  fuzzy  sets.  The  FCMED  clustering  algorithm  is  presented  in  Section  IV.  The 
conclusions are contained in Section V.

II. THE FUZZY MEDIAN AND THE FUZZY MAD


Robust  statistics  are  resistant  to  outliers  because  they  are  designed  assuming  variations  to  the 
underlying  statistical  distribution  will  occur  [15-18].  Often,  a  robust  statistic  is  rated  by  its 
breakdown  point,  which  is  loosely  defined  as  the  fraction  of  outliers  that  must  be  present  before 
the  statistic  no  longer  provides  a  meaningful  estimate.  Just  as  the  median  is  a  robust  alternative 
to  the  mean,  the  MAD  is  the  robust  alternative  to  the  standard  deviation.  Both  statistics  have  a 
high breakdown point. Throughout this section, the data samples are assumed to be
one-dimensional.

The median is defined on the data set $X = \{x_1, x_2, \ldots, x_N\}$, where each element is a real
number $x_k \in \mathbb{R}$. The ordered N-sample is denoted by $\{x_{(1)}, x_{(2)}, \ldots, x_{(N)}\}$, where
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ are collectively defined as the order statistics [19, p. 22]. Here, the
median of X is defined to be $x_{(l+1)}$ if $N = 2l + 1$ and to be $[x_{(l)} + x_{(l+1)}]/2$ if $N = 2l$. The median
represents the halfway point of the samples, having an equal number of samples smaller and
larger than itself. Accordingly, half of the points to the left of the median must be outliers before
the median is pulled toward the left, which explains why the finite breakdown point of the
median is one-half [15]. For vector samples, $x_k \in \mathbb{R}^p$, $p > 1$, the definition is applied to each
dimension of the sample and the median vector is defined to be the vector of individual medians.

To construct the MAD, take the data set X and form another data set
$Y = \{|x_1 - \mathrm{med}(X)|, \ldots, |x_N - \mathrm{med}(X)|\}$, find the median of Y and then scale it. For this paper, the
MAD is defined as $\mathrm{mad}(X) = \mathrm{med}(Y)/0.6745$, where the constant 0.6745 adjusts the dispersion
measure to be 1 when the sample is Gaussian with unit variance. Intuitively, one folds the
centered data $\{x_k - \mathrm{med}(X)\}_{k=1}^{N}$ about 0, then finds the median of the set of positive deviations
from the median. The breakdown point of the MAD is also one-half [15, pp. 105-107].
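
As a minimal sketch of the two crisp statistics just described, the following Python fragment computes the median and the scaled MAD and checks that replacing one sample by a gross outlier leaves both unchanged. The function names `median` and `mad` and the sample values are illustrative assumptions, not part of the original development.

```python
def median(values):
    """Sample median: the middle order statistic, or the mean of the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        return ordered[half]
    return 0.5 * (ordered[half - 1] + ordered[half])


def mad(values, scale=0.6745):
    """Median of the absolute deviations from the median, divided by 0.6745."""
    center = median(values)
    deviations = [abs(x - center) for x in values]
    return median(deviations) / scale


# Hypothetical data: a clean sample and the same sample with one gross outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
contaminated = clean[:-1] + [700.0]

print(median(clean), median(contaminated))  # 4.0 4.0
print(mad(clean) == mad(contaminated))      # True
```

A sample mean of the contaminated data would move from 4.0 to 103.0, which is the kind of sensitivity the median and the MAD are meant to avoid.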

The median and the MAD are defined on crisp sets, which implicitly assumes that each data
point has membership 1 in the set. The implicit role of the sample memberships is evident when
the median m is defined as the solution of

$$\min_{m \in \mathbb{R}} P_{crisp}(m) = \min_{m \in \mathbb{R}} \sum_{k=1}^{N} |x_k - m|$$

[18, pp. 233-234]. An informal solution is found by taking the derivative of $P_{crisp}(m)$ with respect
to m, setting it equal to zero, and multiplying through by -1, giving $\sum_{k=1}^{N} \mathrm{sgn}(x_k - m) = 0$.
If $N = 2l + 1$, the unique solution is $m = x_{(l+1)}$, and if $N = 2l$, the derivative is zero for any
$m \in (x_{(l)}, x_{(l+1)})$. In the latter case, the root m is not unique, but is made so by arbitrarily
choosing a suitable point within this interval, e.g., the average of $x_{(l+1)}$ and $x_{(l)}$. Strictly speaking,
this solution is not proper since the derivative of $|x_k - m|$ at $x_k = m$ does not exist; however, it is
easily repaired [18, p. 234]. Following [17], define $m = (m^* + m^{**})/2$, where
$m^* = \sup\{m \mid \sum_{k=1}^{N} \mathrm{sgn}(x_k - m) > 0\}$ and $m^{**} = \inf\{m \mid \sum_{k=1}^{N} \mathrm{sgn}(x_k - m) < 0\}$, so that one
avoids the problem of taking the derivative at the jump point.

The informal solution is used in other sections because it is shorter and easily formalized.

The definition of fuzzy order statistics requires two sequences of real numbers: the data X
and their corresponding memberships $U = \{u_1, u_2, \ldots, u_N\}$. A permutation $per\{1, 2, \ldots, N\}$ of the
integers $\{1, 2, \ldots, N\}$ is needed to order X. The fuzzy order statistics are collectively defined as
$x_{per(1)} \le x_{per(2)} \le \cdots \le x_{per(N)}$ along with their corresponding memberships
$\{u_{per(1)}, u_{per(2)}, \ldots, u_{per(N)}\}$. Since the same permutation that ordered the data vector X is applied
to U, the association of data point to its membership is retained.
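
The pairing of each sample with its membership under a common permutation can be sketched as follows. This is only an illustration under the definitions above; the function name `fuzzy_order_statistics` and the three-point data set are assumptions made for the example.

```python
def fuzzy_order_statistics(x, u):
    """Sort the data and carry each membership along with its sample.

    Returns the ordered data, the memberships reordered by the same
    permutation, and the permutation itself (0-based indices).
    """
    if len(x) != len(u):
        raise ValueError("x and u must have the same length")
    per = sorted(range(len(x)), key=lambda k: x[k])  # permutation that orders x
    return [x[k] for k in per], [u[k] for k in per], per


# Hypothetical unsorted sample with its memberships.
x = [3.0, 1.0, 2.0]
u = [0.1, 0.2, 0.4]
xs, us, per = fuzzy_order_statistics(x, u)
print(xs)   # [1.0, 2.0, 3.0]
print(us)   # [0.2, 0.4, 0.1]
print(per)  # [1, 2, 0]
```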

The functional definition of the median generalizes to fuzzy sets. For the c-class problem, if
$u_{ik}$ is the membership of $x_k$ in class i, then $m_i$ solves the minimization of this weighted
objective functional

$$\min_{m_i \in \mathbb{R}} P_{fuzzy}(m_i) = \min_{m_i \in \mathbb{R}} \sum_{k=1}^{N} u_{ik} |x_k - m_i|.$$

The solution $m_i$ is a weighted median applied to the i-th fuzzy set, where the weights are found in
the definition of the fuzzy set $X_i = u_{i1}/x_1 + u_{i2}/x_2 + \cdots + u_{iN}/x_N$. Here, the statistic $m_i$ is called
the fuzzy median of the i-th class. The derivative of $-P_{fuzzy}(m_i)$ with respect to $m_i$ is given by
$\Psi_{fuzzy}(m_i) = \sum_{k=1}^{N} u_{ik}\,\mathrm{sgn}(x_k - m_i)$ and its root $m_i$ is the fuzzy median. When the root is not
unique, it is made so by averaging the domain values where the derivative is zero. So the fuzzy
median is a weighted median where the weights are the memberships of the sample points in the
fuzzy set. This statistic reduces to the median when the weights are all equal.

As an example of the fuzzy median, consider a small one-class data set with its associated
membership vector given in Table 1 and the plots of the $P$ and $\Psi$ functions in Figure 1.

Figure 1. RHO ($P$) and PSI ($\Psi$) functionals showing the fuzzy median value (horizontal axis: $m_i$, the fuzzy median).

TABLE I

Example Data Set X With Corresponding Membership Values.

X    1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0
u    0.2   0.4   0.1   0.6   0.9   0.7   0.3   0.4

The  fuzzy  median  is  5.0,  which  is  not  equal  to  the  classical  median  of  4.5.  In  this  example  the 
fuzzy  median  is  unique  because  the  root  is  unique. 
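
The Table 1 result can be reproduced with a short sketch that locates the root of the weighted sign sum by accumulating memberships over the sorted samples. The names `psi` and `fuzzy_median`, the tolerance `tol`, and the midpoint rule for a flat root interval are implementation assumptions chosen to match the description in the text.

```python
def psi(m, x, u):
    """Weighted sign sum: sum_k u_k * sgn(x_k - m)."""
    return sum(uk * ((xk > m) - (xk < m)) for xk, uk in zip(x, u))


def fuzzy_median(x, u, tol=1e-12):
    """Weighted median of x with memberships u as weights.

    If the weighted sign sum vanishes on a whole interval between two
    samples, the midpoint of that interval is returned.
    """
    pairs = sorted((xk, uk) for xk, uk in zip(x, u) if uk > 0)
    if not pairs:
        raise ValueError("at least one positive membership is required")
    half = 0.5 * sum(uk for _, uk in pairs)
    cum = 0.0  # membership accumulated from the left, including the current sample
    for idx, (xk, uk) in enumerate(pairs):
        cum += uk
        if abs(cum - half) <= tol and idx + 1 < len(pairs):
            return 0.5 * (xk + pairs[idx + 1][0])  # flat interval: take its midpoint
        if cum > half:
            return xk
    return pairs[-1][0]  # safeguard; not reached when weights are positive


# Data and memberships from Table 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u = [0.2, 0.4, 0.1, 0.6, 0.9, 0.7, 0.3, 0.4]

print(fuzzy_median(x, u))          # 5.0
print(fuzzy_median(x, [1.0] * 8))  # 4.5, the classical median
```

The sign sum changes sign across the returned value: `psi(4.9, x, u)` is positive and `psi(5.1, x, u)` is negative, so the root at 5.0 is unique.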




The MAD can also be reformulated into the functional form $\min_{\eta} \sum_{k=1}^{N} \big|\,|x_k - m| - \eta\,\big|$ that is
minimized with respect to $\eta$ and the resulting statistic defined as $\mathrm{mad} = \eta/0.6745$. The mad
estimator requires the median m be known beforehand. For a fuzzy data set, the median does
not exist; however, the fuzzy median does. For the i-th fuzzy data set $X_i$, one can define the
fuzzy MAD in terms of the fuzzy median $m_i$ and the functional
$\min_{\eta_i} \sum_{k=1}^{N} u_{ik} \big|\,|x_k - m_i| - \eta_i\,\big|$. The
fuzzy MAD is given by $\mathrm{fuzmad}_i = \eta_i/0.6745$. From an implementation point of view, one first
forms the fuzzy median $m_i$, uses this to construct a new fuzzy data set
$Y_i = u_{i1}/|x_1 - m_i| + u_{i2}/|x_2 - m_i| + \cdots + u_{iN}/|x_N - m_i|$, finds the fuzzy median on this set, then
scales it. For the example in Table 1, the crisp median absolute deviation is 2.0 before scaling
(about 2.97 after dividing by 0.6745), whereas the fuzzy MAD is 1.48 after scaling, since the
membership is highest around the central values of the sample. For p-dimensional data, one
applies it to each component separately.
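
The implementation route described above (fuzzy median, then fuzzy median of the absolute deviations, then scaling) can be sketched as follows for the Table 1 data. The helper `weighted_median` and the function `fuzzy_mad` are hypothetical names; the tie tolerance of 1e-12 is an assumption.

```python
def weighted_median(values, weights):
    """Weighted median; the midpoint is used when the root interval is flat."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one positive weight is required")
    half = 0.5 * sum(w for _, w in pairs)
    cum = 0.0
    for idx, (v, w) in enumerate(pairs):
        cum += w
        if abs(cum - half) <= 1e-12 and idx + 1 < len(pairs):
            return 0.5 * (v + pairs[idx + 1][0])
        if cum > half:
            return v
    return pairs[-1][0]


def fuzzy_mad(x, u, scale=0.6745):
    """Fuzzy median of |x_k - m_i| with the same memberships, divided by 0.6745."""
    m = weighted_median(x, u)
    deviations = [abs(xk - m) for xk in x]
    return weighted_median(deviations, u) / scale


# Data and memberships from Table 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u = [0.2, 0.4, 0.1, 0.6, 0.9, 0.7, 0.3, 0.4]

print(round(fuzzy_mad(x, u), 2))          # 1.48
print(round(fuzzy_mad(x, [1.0] * 8), 2))  # 2.97 (2.0 before dividing by 0.6745)
```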

Although defining the fuzzy median and fuzzy MAD is only a simple modification of the
crisp statistics, it allows the important application of robust statistics to fuzzy sets. The median is
also a Huber M-estimator [16] implicitly defined via functionals of the form
$P(m) = \sum_{k=1}^{N} \rho(x_k - m)$, where $\rho$ satisfies certain boundary, symmetry, and non-negativity conditions.
Because these summands can be weighted with the appropriate sample memberships, this whole
class of M-estimators applies to fuzzy algorithm development.

III.  THE  FUZZY  QUANTILES 


In this section, the weighted M-estimator functionals are applied to derive fuzzy quantiles by
defining $\rho(x)$ asymmetrically. First, the crisp quantiles are redefined using $P(m)$. Define

$$\rho(x) = \begin{cases} px, & \text{if } x > 0 \\ -qx, & \text{otherwise} \end{cases}$$

where for the sake of definiteness it is assumed that $p + q = 1$; then the p-th quantile is the
value of m that minimizes $P(m) = \sum_{k=1}^{N} \rho(x_k - m)$. The minimum of $P(m)$ is found by taking
derivatives. Define $\Psi(m) = -P'(m) = \sum_{k=1}^{N} \rho'(x_k - m) = \sum_{k=1}^{N} \psi(x_k - m)$, where $\rho'(x)$ is defined in
terms of indicator functions

$$I_{[A]} = \begin{cases} 1, & \text{if } A \\ 0, & \text{if not } A \end{cases}$$

Then $\psi(x) = \rho'(x) = p I_{[x>0]} + \tfrac{1}{2}(p - q) I_{[x=0]} - q I_{[x<0]}$ is a step function located at $x = 0$. If
$p + q = 1$, the jump size is 1, going from $-q$ to $+p$ at $x = 0$. $\Psi(m)$ is a monotone
non-increasing function that starts at $pN$ when $m < x_{(1)}$ and reaches $-qN$ when $m > x_{(N)}$.
Intuitively, this is a method of counting since at each data point $x_k$, as one moves from left to
right on the real number line, the functional $\Psi(m)$ decreases by one. So, if $p = q = 1/2$, then the
root of the equation $\Psi(m) = 0$ occurs when half of the points are to the left of m and half are to
the right. When $p = 1/4$, then $pN = N/4$ and $-qN = -3N/4$, and the root m occurs where
one-quarter of the samples are to the left and three-quarters are to the right; that is, m is the first
sample quartile, ignoring the uniqueness of the root. For large N, it can be shown [19, p. 36] that
if X consists of independent and identically distributed random variables (iidrv's) with
distribution $F_X$, then $E(x_{(r)}) \approx F_X^{-1}(r/(N+1))$. If $r = pN$, then $E(x_{(r)})$ is approximately equal to
the p-th quantile or, for large N, the sample quantile approaches the population quantile.

The fuzzy quantiles are defined by modifying the functional $P(m)$ to be
$P_{fuzzy}(m_i) = \sum_{k=1}^{N} u_{ik}\,\rho(x_k - m_i)$, which is minimized over $m_i \in \mathbb{R}$, and then, with
$\Psi_{fuzzy}(m_i) = -P'_{fuzzy}(m_i)$, solving the equation
$\Psi_{fuzzy}(m_i) = \sum_{k=1}^{N} u_{ik}\,\psi(x_k - m_i) = 0$. Here $m_i$ is the p-th quantile of the i-th set. When
$p = q = 1/2$, then the root of the equation $\Psi_{fuzzy}(m_i) = 0$ still occurs when the "number of points"
to the left of $m_i$ equals the "number of points" to the right. But in this new context, the "number
of points" is interpreted to be the "sum of their fuzzy cardinality." The fuzzy cardinality of the
points in the i-th fuzzy set is defined as $N_i = \sum_{k=1}^{N} u_{ik} = \sum_{k=1}^{N} u_i(x_k)$, where the total number of
samples is given by $N = \sum_{i=1}^{c} N_i$. So to find the fuzzy median in the set of ordered points
$\{x_{per(k)}\}_{k=1}^{N}$, sum the corresponding memberships $u_{per(k)}$ from left to right until half of the fuzzy
cardinality is to the left of the median and half is to the right. In like manner, when finding the
first fuzzy quartile, one-fourth of the fuzzy cardinality should be to the left of the point and
three-fourths should be to the right. Viewed in this manner, fuzzy quantiles possess the same
strong intuitive appeal as their crisp counterparts. For the same data given in Table 1, consider
the first quartile ($p = 1/4$). Here $N_i$ is 3.6, so $pN_i = 0.9$ and $-qN_i = -2.7$. Again, the fuzzy
sample-quartile value of 4 is not the same as the crisp sample-quartile value of 2, if the definition
of the p-th sample quantile [19, p. 41] is $X_{(r)}$, $r = \lceil Np \rceil$. To maintain consistency with the fuzzy
definition of quantile, the sample quantile convention adopted here is $X_{(r)}$ if $pN$ is not an integer,
and $[X_{(r)} + X_{(r+1)}]/2$ if $pN$ is an integer. Then if the data memberships are all 1.0, the fuzzy quartile
and the crisp quartile will coincide and, for the example of Table 1, the quartile will be 2.5.
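
The counting interpretation above suggests a direct computation: accumulate memberships over the ordered samples until a fraction p of the fuzzy cardinality lies to the left. The sketch below follows that reading for the Table 1 data; the function name `fuzzy_quantile` and the tie tolerance are assumptions for illustration.

```python
def fuzzy_quantile(x, u, p, tol=1e-12):
    """p-th fuzzy quantile: the point with p * N_i of the fuzzy cardinality to its left.

    When the accumulated membership equals p * N_i exactly at a sample, the
    root is not unique and the midpoint to the next sample is returned.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    pairs = sorted((xk, uk) for xk, uk in zip(x, u) if uk > 0)
    if not pairs:
        raise ValueError("at least one positive membership is required")
    target = p * sum(uk for _, uk in pairs)  # p * N_i
    cum = 0.0
    for idx, (xk, uk) in enumerate(pairs):
        cum += uk
        if abs(cum - target) <= tol and idx + 1 < len(pairs):
            return 0.5 * (xk + pairs[idx + 1][0])
        if cum > target:
            return xk
    return pairs[-1][0]


# Data and memberships from Table 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u = [0.2, 0.4, 0.1, 0.6, 0.9, 0.7, 0.3, 0.4]

print(fuzzy_quantile(x, u, 0.25))          # 4.0, the fuzzy first quartile
print(fuzzy_quantile(x, [1.0] * 8, 0.25))  # 2.5, the crisp convention adopted here
print(fuzzy_quantile(x, u, 0.5))           # 5.0, the fuzzy median
```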

IV.  FUZZY  CLUSTERING 

The FCMED clustering algorithm is presented after first stating the FCM clustering
algorithm. As with the FCM, the FCMED algorithm is obtained by first minimizing the objective
functional with respect to the MFs and then with respect to the centering statistic. The MFs for
the FCMED are stated and the centering statistic is shown to be the fuzzy median.

A.  Fuzzy  c-means  (FCM)  clustering  algorithm 

The  FCM  is  a  practical  clustering  algorithm  that  generalizes  the  crisp  c-means  algorithm  [20- 
21].  It  generalizes  by  replacing  the  class  assignment  with  a  membership  vector  whose  elements 
represent  the  membership  of  the  data  points  in  each  of  the  classes.  The  algorithm  produces  a 
fuzzy partition of the data into c classes, i.e., each point has a membership vector or a fuzzy unit
vector (fit vector) associated with it, rather than a single class assignment. The algorithm is an
unsupervised learning technique. The following description of the FCM is based on [20].

Consider N data samples forming the data set denoted by $X = \{x_1, x_2, \ldots, x_N\}$, where each
sample $x_k \in \mathbb{R}^p$ is a p-dimensional real vector. Assume there are c classes and
$u_{ik} = u_i(x_k) \in [0,1]$ is the membership of the k-th sample in the i-th class. Each sample point
$x_k$ satisfies the constraint that $\sum_{i=1}^{c} u_{ik} = 1$. The set of exemplars or prototypes for the c clusters is
given by $v = (v_1, v_2, \ldots, v_c)$. The FCM algorithm minimizes the functional

$$J(U,v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c} d_{ik}^{2}, \quad \text{where } d_{ik} = \|v_i - x_k\|_2$$

subject to the above constraint. The AO method is one technique to achieve the minimum. The
power $m_c$ of the membership is called the weighting exponent. Using the memberships U, class
exemplars are calculated from the data points. The class exemplars are then used to calculate
new memberships. This procedure is repeated until some form of convergence occurs. A detailed
version of this algorithm is given in [20, p. 66]. The FCM exemplars are linear statistics or
weighted averages of the data points where the weights are scaled versions of the memberships.
Unfortunately, linear statistics are known to be vulnerable to outliers [22].
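
One sweep of the alternating optimization just described can be sketched as below. This is a minimal illustration rather than the detailed algorithm of [20]: the function name `fcm_sweep`, the toy data, the initial memberships, and the rule of sharing membership equally among exemplars that coincide with a sample are all assumptions.

```python
import math


def fcm_sweep(data, u, m_c=2.0):
    """One AO sweep: exemplars from memberships, then memberships from exemplars."""
    c, n, p = len(u), len(data), len(data[0])
    # Exemplars are weighted means with weights u_ik ** m_c.
    centers = []
    for i in range(c):
        w = [u[i][k] ** m_c for k in range(n)]
        total = sum(w)
        centers.append([sum(w[k] * data[k][j] for k in range(n)) / total for j in range(p)])
    # Memberships: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m_c - 1)), Euclidean d.
    new_u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        d = [math.dist(centers[i], data[k]) for i in range(c)]
        hits = [i for i in range(c) if d[i] == 0.0]
        if hits:  # sample lies on an exemplar (assumed rule: share membership equally)
            for i in hits:
                new_u[i][k] = 1.0 / len(hits)
        else:
            for i in range(c):
                new_u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m_c - 1.0)) for j in range(c))
    return centers, new_u


# Hypothetical one-dimensional data with two well-separated groups.
data = [[0.0], [0.2], [0.1], [5.0], [5.1], [4.9]]
u = [[0.6, 0.7, 0.55, 0.4, 0.45, 0.3]]
u.append([1.0 - a for a in u[0]])  # memberships of each sample sum to 1

for _ in range(20):
    centers, u = fcm_sweep(data, u)
print(centers)  # exemplars close to 0.1 and 5.0
```

Because each exemplar is a weighted mean, a single distant sample with non-negligible membership would shift it, which is the vulnerability noted above.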

B.  Fuzzy  c-Medians  (FCMED)  clustering  algorithm 

For the FCMED, the $\ell_1$ objective functional is [9]:

$$J(U,v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c} d_{ik}, \quad \text{where } d_{ik} = \|v_i - x_k\|_1 = \sum_{j=1}^{p} |x_k(j) - v_i(j)|$$

where $\|\cdot\|_1$ is the $\ell_1$ metric that is used throughout this subsection. Following [20, pp. 65-69], the
derivation for the weight $u_{ik}$ carries through with $d_{ik}^2$ replaced by $d_{ik}$. The optimal memberships
are then given by:

$$u_{ik} = 1 \Big/ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{1/(m_c - 1)}$$

for the samples that do not fall on the exemplars [1, p. 547]. Samples that fall too close to
exemplars are handled in the same way as with the FCM [20]. When the optimum exemplars are
sought, one is interested in minimizing $J(U,v)$ with respect to v and in this case, one
minimizes $J(U,v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c} \sum_{j=1}^{p} |x_k(j) - v_i(j)|$ by first rewriting it as

$$J(U,v) = \sum_{i=1}^{c} \sum_{j=1}^{p} J(U, v_i(j), i, j), \quad \text{where } J(U, v_i(j), i, j) = \sum_{k=1}^{N} u_{ik}^{m_c} |x_k(j) - v_i(j)|.$$

The functional is separable in j (the dimension) and i (the class) since each of the functions
$J(U, v_i(j), i, j)$ in the objective functional $J(U,v)$ is a function of only one variable $v_i(j)$ [23,
p. 8]. Hence, one minimizes $J(U,v)$ by minimizing each component $J(U, v_i(j), i, j)$ separately.
For each class i and coordinate j, the minimum of $J(U, v_i(j), i, j)$ with respect to $v_i(j)$ is the
fuzzy median. Jajuga [4] also argues that $J(U,v)$ is separable because $J(U, v_i(j), i, j)$ contains
only one unknown $v_i(j)$. In Section II, the fuzzy median (weighted median) with memberships
(weights) for the i-th class and the j-th coordinate was shown to minimize
$J(U, v_i(j), i, j)$. Doing this for each coordinate $j = 1, \ldots, p$ gives the centering vector $v_i$ for
class i. Repeating this for each class i, one produces the cluster exemplars v that minimize
$J(U,v)$ with respect to v. The fact that the fuzzy median (weighted median) is the optimal
centering statistic for the AO-1FCM does not seem to be widely known. Although the objective
functional can be minimized by a general optimization procedure [1], the fuzzy median makes
the AO scheme more intuitive. When the cluster distributions are light-tailed, say Gaussian, then
the asymptotic relative efficiency of the mean with respect to the median suggests that the FCM
should do better than the FCMED [19, p. 283]. Here, the greatest concern is outliers, so
estimation efficiency of the centering statistic is not addressed.

To compare the FCMED algorithm to the FCM, the FCMED was tested on both Gaussian
and Cauchy samples. As one expects, the FCMED exemplar trajectories for the Gaussian
clusters are quite similar to the FCM trajectories. However, for the Cauchy sample, the FCM
does not converge to the cluster centers, while the FCMED does. Figure 2 illustrates the FCMED
applied to the two-dimensional Cauchy antipodal clusters located at $[\pm 1.27, 0]$, with $m_c = 1.25$. The
exemplars were initialized as $[0, \pm 1]$.

Figure 2. FCMED exemplar traces for two Cauchy clusters
located at antipodal positions (outliers not shown due to scale).

The  FCMED  algorithm  has  the  same  algorithmic  structure  as  the  FCM  with  the  AO  method. 

The FCMED algorithm follows:

1. Fix c, the number of classes such that $c \in \{2, \ldots, N - 1\}$.
Choose the $\ell_1$ metric in $\mathbb{R}^p$ and fix the weighting exponent $m_c \in (1, \infty)$.
Initialize the membership matrix denoted by $U^{(0)}$.

2. Construct the c exemplars $v_i$ for $i = \{1, \ldots, c\}$ by finding the fuzzy median with
memberships $u_{ik}^{m_c}$ for each class. Each class exemplar $v_i$ is p-dimensional so $v_i(j)$ must
be found for each $j = \{1, \ldots, p\}$, using just the j-th component of $x_k$.

3. Update the memberships $u_{ik}$ in the membership matrix with
$u_{ik} = 1 \Big/ \sum_{j=1}^{c} \left( d_{ik}/d_{jk} \right)^{1/(m_c - 1)}$, provided of course that none of the $d_{jk}$ are zero.
In the latter case, the $u_{ik}$ are assigned as they are in the FCM algorithm [20, p. 66].

4. Compare the last two membership matrices, $U^{(t)}$ and $U^{(t+1)}$. When they are sufficiently
close, terminate the algorithm; otherwise, return to step 2.
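
The four steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not a reference implementation: the names `weighted_median` and `fcmed`, the stopping tolerance, the toy data with one gross outlier, the initial membership matrix, and the equal-sharing rule for samples that coincide with an exemplar are all assumptions.

```python
def weighted_median(values, weights):
    """Weighted median; the midpoint is used when the root interval is flat."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one positive weight is required")
    half = 0.5 * sum(w for _, w in pairs)
    cum = 0.0
    for idx, (v, w) in enumerate(pairs):
        cum += w
        if abs(cum - half) <= 1e-12 and idx + 1 < len(pairs):
            return 0.5 * (v + pairs[idx + 1][0])
        if cum > half:
            return v
    return pairs[-1][0]


def fcmed(data, u, m_c=1.25, max_iter=100, tol=1e-6):
    """Alternate fuzzy-median exemplars (step 2) and l1 membership updates (step 3)."""
    c, n, p = len(u), len(data), len(data[0])
    centers = []
    for _ in range(max_iter):
        # Step 2: v_i(j) is the fuzzy median of the j-th components, weights u_ik ** m_c.
        centers = [[weighted_median([xk[j] for xk in data], [uik ** m_c for uik in u[i]])
                    for j in range(p)] for i in range(c)]
        # Step 3: memberships from l1 distances with exponent 1 / (m_c - 1).
        new_u = [[0.0] * n for _ in range(c)]
        for k, xk in enumerate(data):
            d = [sum(abs(a - b) for a, b in zip(v, xk)) for v in centers]
            hits = [i for i in range(c) if d[i] == 0.0]
            if hits:  # sample lies on an exemplar (assumed rule: share equally)
                for i in hits:
                    new_u[i][k] = 1.0 / len(hits)
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m_c - 1.0))
                                            for j in range(c))
        # Step 4: stop when successive membership matrices are close.
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < tol:
            break
    return centers, u


# Hypothetical data: two tight groups near (0, 0) and (5, 5) plus one gross outlier.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [-0.1, 0.1], [0.0, 0.2],
        [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.0], [5.0, 5.1],
        [100.0, -80.0]]
u0 = [[0.6] * 5 + [0.4] * 6, [0.4] * 5 + [0.6] * 6]

centers, u = fcmed(data, u0)
print(centers)  # exemplars stay within the two groups despite the outlier
```

Each exemplar coordinate is a sample value (or a midpoint of two), so samples frequently coincide with an exemplar in one or more coordinates; the zero-distance branch in step 3 is therefore exercised routinely rather than exceptionally.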

Note the strong structural similarity of the FCMED and FCM algorithms. The fuzzy median may
be calculated by sorting the sample values. In this case, the time complexity for each exemplar
$v_i$ is easily shown to be $O(pN \log N)$, since for each of the p dimensions of the sample vectors
it takes $O(N \log N)$ operations to sort the data. There are c classes so the time complexity for
Step 2 is $O(cpN \log N)$. The space complexity is $O(N)$, which for large data sets like images
can be quite onerous. More refined algorithms for calculating the weighted median can reduce
the time complexity [24, p. 193] and approximations to the fuzzy median can reduce the space
complexity [25]. A heavy computational price is paid to replace the FCM with the FCMED.

V.  CONCLUSIONS 

The fuzzy median was defined and shown to be a weighted median where the weights may be
interpreted as memberships. Functional definitions of the median and the MAD provided the
formulation to extend these statistics to fuzzy sets. By weighting the functionals with the
memberships, both statistics naturally extend to fuzzy data sets. The quantiles were extended to
the fuzzy data sets using the same approach of explicitly weighting the defining functionals. The
intuitive appeal of the fuzzy quantiles is retained by interpreting counting as summing
memberships. The fuzzy median and the fuzzy quartile were illustrated in separate examples.
The AO-1FCM is a special case of the FCM clustering algorithm that uses an alternating
optimization method and the $\ell_1$ norm. In this case, the cluster exemplars are shown to be the
fuzzy medians and the resulting algorithm is called the fuzzy c-medians (FCMED) clustering
algorithm because of its strong similarity to the FCM. This fuzzy median was linked to Jajuga's
solution formulation for the cluster exemplars as a regression problem, which yielded the
weighted median as the cluster center via the work of Laplace. The FCM and the FCMED
clustering algorithms have similar performance for light-tailed clusters, but quite dissimilar
performance on heavy-tailed clusters. Both algorithms quickly converge when the data is
light-tailed and the number of clusters is fixed. Outliers or heavy-tailed clusters that cause
convergence problems for the FCM are better handled by the FCMED. When the data is unknown or
not well behaved, the FCMED is a robust alternative to the FCM with a heavy computational penalty.